Abstract. Kossovsky recently conjectured that the distribution of leading digits of a chain of probability distributions converges to Benford's law as the length of the chain grows. We prove his conjecture in many cases, and provide an interpretation in terms of products of independent random variables and a central limit theorem. An interesting consequence is that in hierarchical Bayesian models priors tend to satisfy Benford's law as the number of levels of the hierarchy increases, which allows us to develop some simple tests (based on Benford's law) to test proposed models. We give explicit formulas for the error terms as sums of Mellin transforms, which converge extremely rapidly as the number of terms in the chain grows. We may interpret our results as showing that certain Markov chain Monte Carlo processes are rapidly mixing to Benford's law.



1. Introduction 

The distribution of leading digits of numbers in data sets has intrigued researchers for over 100 years. Using scientific notation (base $B$), for any $x > 0$ we may write $x = M_B(x) B^k$, where $k \in \mathbb{Z}$ and $M_B(x) \in [1, B)$ is the mantissa of $x$ base $B$. We say the data follows Benford's law if the probability of having a mantissa of at most $s$ is $\log_B s$. This implies that the probability of observing a first digit of $d$ base $B$ is $\log_B(1 + 1/d)$; in particular, about 30% of the time the first digit is 1 base 10 (and not 11% as one might naively guess). Many systems are known to satisfy Benford's law. Examples include recurrence relations [BrDu], $n!$ and $\binom{n}{k}$ ($0 \le k \le n$) [Dia], iterates of power, exponential and rational maps [BBH, Hi2], values of L-functions near the critical line and characteristic polynomials of random matrix ensembles [KonMi], iterates of the $3x + 1$ map [KonMi, LS] and differences of order statistics [MN], to name a few. In addition to arising in a variety of mathematical settings, Benford's law surfaces in diverse fields, from atomic physics [P] to biology [CLTF] to geology [NM] to the stock market [Ley]. Applications range from detecting fraud in accounting [Nig1, Nig2] and social sciences [Me] to determining optimal ways to store numbers (see page 255 of [Knu] and [BH]). See [Hi1, Rai] for a description and history of the subject, and [Hu] for a detailed bibliography of the field. In this paper we show how Benford's law arises in chains of probability distributions and hierarchical Bayesian models. This allows us to construct tests (based on Benford's law) of certain models. We may interpret our results as saying that in many Markov chain Monte Carlo problems, the stationary distribution of first digits is Benford's law, and the chain has rapid mixing (i.e., few iterations are required to have excellent agreement with Benford's law).
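As a quick illustration of the first-digit formula $\log_B(1 + 1/d)$, the following minimal Python sketch tabulates the Benford probabilities in base 10. The function name is ours, chosen only for illustration.

```python
import math

def benford_first_digit_probs(base=10):
    """Return P(first digit = d) = log_base(1 + 1/d) for d = 1, ..., base - 1."""
    return [math.log(1 + 1 / d, base) for d in range(1, base)]

probs = benford_first_digit_probs(10)
for d, p in enumerate(probs, start=1):
    print(f"digit {d}: {p:.4f}")
```

The probabilities sum to 1 because the sum of $\log_B((d+1)/d)$ telescopes to $\log_B B$.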
Since the early work of Newcomb [New] and Benford [Ben], there have been numerous theoretical advances as to why various data sets and operations yield Benford behavior. One reason for the immense amount of interest generated by this law is the observation that, in many cases, combining two data sets yields a new set which is closer to Benford's law (see for example [Ha]). A common example is street addresses. If one studies the distribution of leading digits on a long street, the result is clearly non-Benford; depending on the length of the street, the probability of a first digit of 1 can oscillate between 1/9 and 5/9. However, if we consider many streets and amalgamate the data (as Benford [Ben] did), the result is quite close to Benford's law. We may interpret the above as first choosing a street length from some distribution, so the street addresses say are integers in $[1, X]$ for some random variable $X$. Then for each choice of $X$ we study the distribution of the leading digits on that street, and then calculate the expected frequencies as $X$ varies.
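The oscillation in the street example can be checked by direct counting. The sketch below (the helper name is hypothetical) computes the fraction of addresses $1, \dots, N$ that begin with the digit 1 for a few street lengths $N$.

```python
def fraction_leading_one(street_length):
    """Fraction of the addresses 1, ..., street_length whose first digit is 1."""
    count = sum(1 for address in range(1, street_length + 1)
                if str(address)[0] == "1")
    return count / street_length

# Lengths just below a power of 10 sit near 1/9; lengths just below
# twice a power of 10 sit near 5/9.
for n in (999, 1999, 9999, 19999):
    print(n, round(fraction_leading_one(n), 4))
```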

In [Ko], Kossovsky suggested such an interpretation and proposed that generalizations of the above procedure will rapidly lead to convergence to Benford behavior. Explicitly, he studied the distribution of leading digits of chained probability distributions, and conjectured that as the length of the chain increases the behavior tends to Benford's law. In this note we quantify and prove some of his conjectures; see [Ko] for a complete description of his investigations. Let $\mathcal{D}_i(\theta)$ denote a one-parameter distribution with parameter $\theta$ and density function $f_{\mathcal{D}_i(\theta)}$; thus by $X \sim \mathcal{D}_i(\theta)$ we mean

$$\mathrm{Prob}(X \in [a, b]) = \int_a^b f_{\mathcal{D}_i(\theta)}(x)\,dx. \tag{1.1}$$

We create a chain of random variables as follows. Let $p : \mathbb{N} \to \mathbb{N}$. Let $X_1 \sim \mathcal{D}_{p(1)}(1)$ and define $X_m$ inductively by $X_m \sim \mathcal{D}_{p(m)}(X_{m-1})$. Computer simulations and other considerations led Kossovsky to conjecture that if our underlying distributions are 'nice', then as $n \to \infty$ the distribution of the leading digits of $X_n$ converges to Benford's law, and further that if $X_1$ is Benford then $X_n$ is Benford. Note that our example of street addresses is just a special case with a chain length of two and uniform distributions. Another way of stating our results is that for certain Markov chain Monte Carlo processes, Benford's law is absorbing for the distribution of first digits (and in fact the system is rapidly mixing as well).
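The chaining procedure is easy to simulate. The following is a minimal sketch, not Kossovsky's original experiment: it assumes the special case $X_1 \sim \mathrm{Unif}(0,1)$ and $X_m \sim \mathrm{Unif}(0, X_{m-1})$, and the function names, sample size and seed are our own choices.

```python
import math
import random

def leading_digit(x):
    """First significant decimal digit of a positive float."""
    exponent = math.floor(math.log10(x))
    digit = int(x / 10 ** exponent)
    return min(max(digit, 1), 9)  # guard against floating-point rounding at the edges

def uniform_chain_digit_freqs(chain_length, samples=100_000, seed=0):
    """Empirical first-digit frequencies of X_n for the uniform chain
    X_1 ~ Unif(0, 1), X_m ~ Unif(0, X_{m-1})."""
    rng = random.Random(seed)
    counts = [0] * 9
    for _ in range(samples):
        x = 1.0
        for _ in range(chain_length):
            # x * U with U uniform on (0, 1] is a draw from Unif(0, x];
            # using 1 - random() keeps x strictly positive.
            x *= 1.0 - rng.random()
        counts[leading_digit(x) - 1] += 1
    return [c / samples for c in counts]

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
for n in (1, 2, 6):
    freqs = uniform_chain_digit_freqs(n)
    worst = max(abs(f - b) for f, b in zip(freqs, benford))
    print(f"chain length {n}: max deviation from Benford = {worst:.4f}")
```

With a chain of length 1 the digits are essentially equidistributed; the deviation from Benford's law should shrink quickly as the chain grows.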

We prove his claims in several cases, providing a partial answer to which distributions are 'nice'.$^1$ Before stating our results, we first discuss some important consequences. Returning to our street example, we see we may reformulate it in terms of a Bayesian model (see [Ber] for more details). In Bayesian models we have some data (say $x$) whose values depend on a parameter (say $\beta$, whose distribution is called the prior). Thus there are two densities, that of the data (which depends on $\beta$) and that of the prior. In our situation, $x$ would be the street address, drawn from a uniform distribution on say $[1, \beta]$, and then $\beta$ would be drawn from some distribution modeling how street lengths are distributed. One can of course consider more involved models where the prior depends on a hyperparameter drawn from a different distribution (and so on). These are called hierarchical Bayesian models, and in this setting we again encounter chains of distributions, where the length of the chain is basically the number of levels.

$^1$ The conjecture may fail if we chain arbitrary parameters of arbitrary distributions. A good test case is to consider chaining the shape parameter $\gamma$ of a Weibull distribution: $f(x) = \gamma x^{\gamma - 1} \exp(-x^\gamma)$ for $x \ge 0$. The difficulty with numerics here is that very quickly we end up with a very small shape parameter (say less than $10^{-20}$), and thus the numerics become suspect.

One of the major problems in Bayesian theory is to justify the choice of the prior. Many ideas have been proposed (for example, Jeffreys prior, conjugate priors, empirical Bayes, hierarchical models). In putting priors on hyperparameters, we often make our prior more "diffuse", so to speak, or less informative. Our main result says that, in many cases, a non-informative prior in this hierarchical sense leads to sample data closely approximating Benford's law; further, in many situations a Benford prior might be the true non-informative prior, rather than classic approaches which are essentially variants on the uniform distribution. Our results can thus be used as a data integrity check in this situation.

We introduce some notation and then state our main results. By $\mathrm{Err}(z)$ we mean an error at most $z$ in absolute value. Let $f(x)$ be a continuous real-valued function on $[0, \infty)$. We define its Mellin transform, $(\mathcal{M}f)(s)$, by

$$(\mathcal{M}f)(s) = \int_0^\infty f(x)\, x^s\, \frac{dx}{x}. \tag{1.2}$$

Note that if $f$ is the density of a random variable $X$ then $(\mathcal{M}f)(s) = \mathbb{E}[X^{s-1}]$, and thus results about expected values translate to results on Mellin transforms; for example, $(\mathcal{M}f)(1) = 1$ for any distribution supported on $[0, \infty)$.
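The definition (1.2) can be sanity-checked numerically. The sketch below is an illustration under our own assumptions (real $s$, a rapidly decaying $f$, a trapezoid rule after the substitution $x = e^t$); the function names and the integration window are hypothetical choices. It confirms $(\mathcal{M}f)(1) = 1$ for the standard exponential density and compares other values with the Gamma function.

```python
import math

def mellin_transform(f, s, t_min=-40.0, t_max=8.0, steps=20_000):
    """Approximate (Mf)(s) = int_0^inf f(x) x^s dx/x for real s.

    Substituting x = e^t turns the integral into int f(e^t) e^{ts} dt,
    which the trapezoid rule handles well when f decays quickly.
    """
    h = (t_max - t_min) / steps
    total = 0.0
    for j in range(steps + 1):
        t = t_min + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * f(math.exp(t)) * math.exp(t * s)
    return total * h

def exp_density(x):
    """Density of the standard exponential distribution on [0, inf)."""
    return math.exp(-x)

print(mellin_transform(exp_density, 1.0))                     # total mass of the density
print(mellin_transform(exp_density, 2.5), math.gamma(2.5))    # the two values should agree closely
```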

If $g(s)$ is an analytic function for $\Re(s) \in (a, b)$ such that $g(c + iy)$ tends to zero uniformly as $|y| \to \infty$ for any $c \in (a, b)$, then the inverse Mellin transform, $(\mathcal{M}^{-1}g)(x)$, is given by

$$(\mathcal{M}^{-1}g)(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} g(s)\, x^{-s}\, ds \tag{1.3}$$

(provided that the integral converges absolutely). If we set $g(s) = (\mathcal{M}f)(s)$ then $f(x) = (\mathcal{M}^{-1}g)(x)$. We define the convolution of two functions $f_1$ and $f_2$ by

$$(f_1 * f_2)(x) = \int_0^\infty f_2\!\left(\frac{x}{t}\right) f_1(t)\, \frac{dt}{t} = \int_0^\infty f_1\!\left(\frac{x}{t}\right) f_2(t)\, \frac{dt}{t}. \tag{1.4}$$

The Mellin convolution theorem states that

$$(\mathcal{M}(f_1 * f_2))(s) = (\mathcal{M}f_1)(s) \cdot (\mathcal{M}f_2)(s), \tag{1.5}$$

which by induction$^2$ gives

$$(\mathcal{M}(f_1 * \cdots * f_n))(s) = (\mathcal{M}f_1)(s) \cdots (\mathcal{M}f_n)(s). \tag{1.6}$$

See Appendix 2 of [Pa] for an enumeration of properties of the Mellin transform.$^3$ Our main results are the following:



$^2$ As $(\mathcal{M}f)(s) = \mathbb{E}[X^{s-1}]$, we may re-interpret the following in terms of products of independent random variables; see also Remark 2.3.

$^3$ If we let $x = e^{2\pi u}$ and $s = \sigma - i\xi$, then $(\mathcal{M}f)(\sigma - i\xi) = 2\pi \int_{-\infty}^{\infty} \left(f(e^{2\pi u}) e^{2\pi \sigma u}\right) e^{-2\pi i u \xi}\, du$, which is the Fourier transform of $g(u) = 2\pi f(e^{2\pi u}) e^{2\pi \sigma u}$. The Mellin and Fourier transforms are thus related; in fact, it is this logarithmic change of variables which explains why both enter into Benford's law problems. For proofs of the Mellin transform properties one can therefore just mimic the proofs of the corresponding statements for the Fourier transform; a good reference is [SS].
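Theorem 1.1. Let $\{\mathcal{D}_i(\theta)\}_{i \in I}$ be a collection of one-parameter distributions with associated densities $f_{\mathcal{D}_i(\theta)}$ which vanish outside of $[0, \infty)$. Let $p : \mathbb{N} \to I$, $X_1 \sim \mathcal{D}_{p(1)}(1)$, $X_m \sim \mathcal{D}_{p(m)}(X_{m-1})$, and assume

(1) for each $m \ge 2$,

$$f_m(x_m) = \int_0^\infty f_{\mathcal{D}_{p(m)}(1)}\!\left(\frac{x_m}{x_{m-1}}\right) f_{m-1}(x_{m-1})\, \frac{dx_{m-1}}{x_{m-1}} \tag{1.7}$$

where $f_m$ is the density of the random variable $X_m$ (see Lemma 1.2 for examples where this condition is satisfied);

(2) we have

$$\lim_{n \to \infty} \sum_{\substack{\ell = -\infty \\ \ell \neq 0}}^{\infty} \prod_{m=1}^{n} \left| (\mathcal{M}f_{\mathcal{D}_{p(m)}(1)})\!\left(1 - \frac{2\pi i \ell}{\log B}\right) \right| = 0. \tag{1.8}$$

Then as $n \to \infty$ the distribution of leading digits of $X_n$ tends to Benford's law. Further, the error is a nice function of the Mellin transforms. Explicitly, if $Y_n = \log_B X_n$, then

$$\left| \mathrm{Prob}(Y_n \bmod 1 \in [a, b]) - (b - a) \right| \le (b - a) \cdot \sum_{\substack{\ell = -\infty \\ \ell \neq 0}}^{\infty} \prod_{m=1}^{n} \left| (\mathcal{M}f_{\mathcal{D}_{p(m)}(1)})\!\left(1 - \frac{2\pi i \ell}{\log B}\right) \right|. \tag{1.9}$$

If $I$ is finite and all densities are continuous, then the second condition holds.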

The second condition in Theorem 1.1 is extremely weak, and is typically satisfied in all examples of interest. For example, assume $I$ is finite and all the densities are continuous. Then for $\ell \neq 0$ we have rapid decay (in $\ell$) of $(\mathcal{M}f_{\mathcal{D}_{p(m)}(1)})\!\left(1 - \frac{2\pi i \ell}{\log B}\right)$; this is because our expression is equivalent to taking the Fourier transform of a related, continuous function at $\ell / \log B$, which by the Riemann-Lebesgue lemma tends to zero as $|\ell| \to \infty$. With some work, we can construct a pathological infinite family of distinct densities where this product condition fails; see [MN] for the details. Note that for any density $f$ we have $(\mathcal{M}f)(1) = 1$. This is why in (1.8) we sum only over $\ell \neq 0$; the $\ell = 0$ term is always 1, and gives the main term. Frequently this sum tends to zero very rapidly with $n$; we give some explicit examples in §3.

The first condition is more serious, and thus we give a few non-trivial examples where it holds.

Lemma 1.2. Assume the density $f_{\mathcal{D}_{p(m)}(\theta)}(x) = \theta^{-1} f(x/\theta)$ for some $f$ (with antiderivative $F$). Let $X_{m-1}$ have density $f_{m-1}$ and let $X_m \sim \mathcal{D}_{p(m)}(X_{m-1})$. Then (1.7) is satisfied for $X_m$. Examples include

• Let $\mathcal{D}_{\rm unif}(\theta)$ be the uniform distribution on $[0, \theta]$ (thus $f_{\mathcal{D}_{\rm unif}(\theta)}(x) = 1/\theta$ for $x \in [0, \theta]$ and 0 otherwise);

• Let $\mathcal{D}_{\rm exp}(\theta)$ be the exponential distribution with parameter $\theta$ (thus $f_{\mathcal{D}_{\rm exp}(\theta)}(x) = \theta^{-1} \exp(-x/\theta)$ for $x \ge 0$ and 0 otherwise);

• Let $\mathcal{D}_{|{\rm gauss}|}(\theta)$ be the distribution of $|W|$ where $W \sim N(0, \theta/\sqrt{2})$, the second parameter being the standard deviation (thus $f_{\mathcal{D}_{|{\rm gauss}|}(\theta)}(x) = (2/\sqrt{\pi\theta^2}) \exp(-(x/\theta)^2)$ if $x \ge 0$ and 0 otherwise).

Thus we see that fixing all the parameters but the standard deviation always gives a density satisfying the conditions.

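Proof. We calculate the density $f_m$ of $X_m$ by differentiating the cumulative distribution function $F_m$. Normalizing the antiderivative so that $F(0) = 0$, we have

$$\begin{aligned}
F_m(x_m) &= \int_{x_{m-1}=0}^{\infty} \mathrm{Prob}(X_m \le x_m \mid X_{m-1} = x_{m-1})\, \mathrm{Prob}(X_{m-1} = x_{m-1})\, dx_{m-1} \\
&= \int_{x_{m-1}=0}^{\infty} \mathrm{Prob}(X_m \le x_m \mid X_{m-1} = x_{m-1})\, f_{m-1}(x_{m-1})\, dx_{m-1} \\
&= \int_{x_{m-1}=0}^{\infty} \left[ \int_{t=0}^{x_m} f\!\left(\frac{t}{x_{m-1}}\right) \frac{dt}{x_{m-1}} \right] f_{m-1}(x_{m-1})\, dx_{m-1} \\
&= \int_{x_{m-1}=0}^{\infty} F\!\left(\frac{x_m}{x_{m-1}}\right) f_{m-1}(x_{m-1})\, dx_{m-1}.
\end{aligned}$$

Differentiating with respect to $x_m$ gives

$$f_m(x_m) = \int_{x_{m-1}=0}^{\infty} \frac{1}{x_{m-1}}\, f\!\left(\frac{x_m}{x_{m-1}}\right) f_{m-1}(x_{m-1})\, dx_{m-1} = \int_{x_{m-1}=0}^{\infty} f\!\left(\frac{x_m}{x_{m-1}}\right) f_{m-1}(x_{m-1})\, \frac{dx_{m-1}}{x_{m-1}}, \tag{1.10}$$

which is (1.7), since $f = f_{\mathcal{D}_{p(m)}(1)}$. □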



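We state two important special cases of Theorem 1.1.

Corollary 1.3. Let the notation be as in Theorem 1.1 and assume all conditions there are satisfied.

• If $p(m) = 1$ for all $m$ (in other words, if we always use the same distribution), then

$$\left| \mathrm{Prob}(Y_n \bmod 1 \in [a, b]) - (b - a) \right| \le (b - a) \cdot \sum_{\substack{\ell = -\infty \\ \ell \neq 0}}^{\infty} \left| (\mathcal{M}f_{\mathcal{D}_1(1)})\!\left(1 - \frac{2\pi i \ell}{\log B}\right) \right|^n. \tag{1.11}$$

• Let $\mathcal{D}_{{\rm Benf},B}$ be the distribution with density

$$f_{{\rm Benf},B}(x) = \begin{cases} \dfrac{1}{x \log B} & \text{if } x \in [1, B) \\ 0 & \text{otherwise.} \end{cases} \tag{1.12}$$

Note if $X \sim \mathcal{D}_{{\rm Benf},B}$ then $X$ is Benford base $B$ (this follows by direct integration). If $\mathcal{D}_{p(1)}(1) = \mathcal{D}_{{\rm Benf},B}$ then for all $n$, $X_n$ is exactly Benford base $B$.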

Finally, we give a simple generalization of Theorem 1.1.

Corollary 1.4. Let the notation and conditions be as in Theorem 1.1, and for each $m \ge 1$ let $r(m)$ be a non-zero integer. Let now $X_m \sim \mathcal{D}_{p(m)}(X_{m-1}^{r(m)})$. Then the results of Theorem 1.1 still hold, except now $\left| \mathrm{Prob}(Y_n \bmod 1 \in [a, b]) - (b - a) \right|$ is at most

$$(b - a) \cdot \sum_{\substack{\ell = -\infty \\ \ell \neq 0}}^{\infty} \prod_{m=1}^{n} \left| (\mathcal{M}f_{\mathcal{D}_{p(m)}(1)})\!\left(1 - \frac{2\pi i\, r(m)\, \ell}{\log B}\right) \right|. \tag{1.13}$$

Remark 1.5. In Corollary 1.4 we could take $r(m) \in \mathbb{Q} - \{0\}$, and the proof would follow similarly. We chose to take $r(m) \in \mathbb{Z} - \{0\}$ as then the claim in Corollary 1.3 also holds.



We prove our main results in §2, and comment on some alternate interpretations of our results. In particular, we show we may interpret our results in terms of the distribution of products of independent random variables, which has been connected to Benford's law by many authors (see the description and references in [MN] for additional details).

One of our goals in this work is to demonstrate the ease of using the Mellin transform to obtain rapidly converging estimates on deviations from Benford's law. To this end we give some examples in §3, where we only use one distribution in the chain, obtaining very rapidly converging (in $n$) bounds.

The proof of Corollary 1.4 follows from Theorem 1.1 and a lemma on the Mellin transform of the density of $X_{m-1}^{r(m)}$, which we give in Appendix A. This is but one of many possible generalizations which can readily be studied using our methods.

Our results immediately apply to the situation of hierarchical Bayesian models with each variable depending on just one other variable. Thus we have established a connection between this field and Benford's law. In particular, we see that when there are many levels then the observed sample values should approximately follow Benford's law, and thus these simple digit frequency tests can be used to test some detailed assumptions about hierarchical Bayesian models. In practice there is excellent agreement with Benford's law even when there are few levels; see the examples in §3 for explicit bounds from uniform and exponential chains as well as examples where such chains may arise. In future work we plan to explore the case of chaining several variables, in order to handle the most general situations; for example, in addition to varying the scale, we will investigate the effects of changing the shape parameters of a distribution (such as the exponent in a Weibull family).



2. Proof of Theorem 1.1

We first prove Theorem 1.1 and then show how Corollary 1.3 follows.

Proof of Theorem 1.1. We first calculate $f_n$, the density of $X_n$. The base case is clear, and for the inductive step we note

$$f_n(x_n) = \int_0^\infty f_{\mathcal{D}_{p(n)}(1)}\!\left(\frac{x_n}{x_{n-1}}\right) f_{n-1}(x_{n-1})\, \frac{dx_{n-1}}{x_{n-1}} = \left(f_{\mathcal{D}_{p(n)}(1)} * f_{n-1}\right)(x_n). \tag{2.1}$$

By the Mellin convolution theorem and induction we have

$$\begin{aligned}
(\mathcal{M}f_n)(s) &= \left(\mathcal{M}(f_{\mathcal{D}_{p(n)}(1)} * f_{n-1})\right)(s) \\
&= (\mathcal{M}f_{\mathcal{D}_{p(n)}(1)})(s) \cdot (\mathcal{M}f_{n-1})(s) \\
&= \prod_{m=1}^{n} (\mathcal{M}f_{\mathcal{D}_{p(m)}(1)})(s).
\end{aligned} \tag{2.2}$$

By the Mellin inversion theorem we find

$$f_n(x_n) = \mathcal{M}^{-1}\!\left[\prod_{m=1}^{n} (\mathcal{M}f_{\mathcal{D}_{p(m)}(1)})(\cdot)\right](x_n). \tag{2.3}$$

To investigate the distribution of the digits of $X_n$ (base $B$) it is convenient to make a logarithmic change of variables. Thus set $Y_n = \log_B X_n$. We have

$$\mathrm{Prob}(Y_n \le y) = \mathrm{Prob}(X_n \le B^y) = F_n(B^y). \tag{2.4}$$

Taking the derivative gives the density of $Y_n$, which we denote by $g_n(y)$:

$$g_n(y) = f_n(B^y)\, B^y \log B. \tag{2.5}$$

A standard method to show $X_n$ tends to Benford behavior as $n \to \infty$ is to show that $Y_n \bmod 1$ tends to the uniform distribution on $[0, 1]$ (see for example [Dia, MT-B]). This can be seen from the following calculation. The key ingredient is Poisson Summation. While the argument is similar to that in [KonMi], the resulting expressions are not in the form considered there, and we thus cannot simply quote their results (though a trivial modification of that argument suffices). Let $h_{n,y}(t) = g_n(y + t)$. Then

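$$\sum_{\ell=-\infty}^{\infty} g_n(y + \ell) = \sum_{\ell=-\infty}^{\infty} h_{n,y}(\ell) = \sum_{\ell=-\infty}^{\infty} \widehat{h}_{n,y}(\ell) = \sum_{\ell=-\infty}^{\infty} e^{2\pi i y \ell}\, \widehat{g}_n(\ell), \tag{2.6}$$

where $\widehat{f}$ denotes the Fourier transform of $f$:

$$\widehat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, dx. \tag{2.7}$$

Letting $[a, b] \subset [0, 1]$, we see that

$$\begin{aligned}
\mathrm{Prob}(Y_n \bmod 1 \in [a, b]) &= \sum_{\ell=-\infty}^{\infty} \int_{a+\ell}^{b+\ell} g_n(y)\, dy \\
&= \int_a^b \sum_{\ell=-\infty}^{\infty} g_n(y + \ell)\, dy \\
&= \int_a^b \sum_{\ell=-\infty}^{\infty} e^{2\pi i y \ell}\, \widehat{g}_n(\ell)\, dy \\
&= b - a + \mathrm{Err}\!\left((b - a) \sum_{\ell \neq 0} |\widehat{g}_n(\ell)|\right).
\end{aligned} \tag{2.8}$$

Note that since $g_n$ is a probability density, $\widehat{g}_n(0) = 1$. The proof is completed by showing that the sum over $\ell \neq 0$ tends to zero as $n \to \infty$. We thus need to compute $\widehat{g}_n(\xi)$:

$$\begin{aligned}
\widehat{g}_n(\xi) &= \int_{-\infty}^{\infty} g_n(y)\, e^{-2\pi i y \xi}\, dy \\
&= \int_{-\infty}^{\infty} f_n(B^y)\, B^y \log B \cdot e^{-2\pi i y \xi}\, dy \\
&= \int_0^{\infty} f_n(t)\, t^{-2\pi i \xi / \log B}\, dt \\
&= (\mathcal{M}f_n)\!\left(1 - \frac{2\pi i \xi}{\log B}\right) = \prod_{m=1}^{n} (\mathcal{M}f_{\mathcal{D}_{p(m)}(1)})\!\left(1 - \frac{2\pi i \xi}{\log B}\right).
\end{aligned} \tag{2.9}$$

Substituting completes the proof. □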

Remark 2.1. If $f$ is a continuous density function, then $\left|(\mathcal{M}f)\!\left(1 - \frac{2\pi i \ell}{\log B}\right)\right| < 1$ for $\ell \neq 0$. This is because $f(x)$ is non-negative and

$$(\mathcal{M}f)\!\left(1 - \frac{2\pi i \ell}{\log B}\right) = \int_0^\infty f(t)\, e^{-2\pi i \ell \log_B t}\, dt; \tag{2.10}$$

note the integral is clearly at most $\int_0^\infty f(t)\, dt = 1$ in absolute value (since $f$ is a density) and in fact is less than this because of the oscillation due to the exponential factor. As $|\ell|$ grows this integral tends to zero rapidly. This follows from our assumption that the Mellin transform is a nice function, and indicates that we have rapid convergence if all the distributions in the chain are equal. An alternate proof of the decay in $|\ell|$ is to note that $(\mathcal{M}f)\!\left(1 - \frac{2\pi i \ell}{\log B}\right)$ is the Fourier transform of $g(u) = f(e^u) e^u$ at $\ell / \log B$, and this tends to zero by the Riemann-Lebesgue lemma.

The above proof suggests the following: 

Corollary 2.2. Let $\sigma$ be a permutation of $\mathbb{N}$ (thus $\sigma$ is a 1-1 and onto map from $\mathbb{N}$ to $\mathbb{N}$). Assume all conditions in Theorem 1.1 hold for both some map $p : \mathbb{N} \to \mathbb{N}$ (with the chained random variables $X_m$) and $p \circ \sigma : \mathbb{N} \to \mathbb{N}$ (with the chained random variables $\widetilde{X}_m$). If $\{p(1), \dots, p(n)\} = \{p(\sigma(1)), \dots, p(\sigma(n))\}$ then the density of $X_n$ equals that of $\widetilde{X}_n$.

Proof. The proof is immediate, and follows from the commutativity of multiplication in the expansion for the density $f_n$ in (2.3). □

Remark 2.3. The proof of Theorem 1.1 suggests another interpretation. Namely, the density of $X_n$ is exactly that of the density of $\Xi_1 \cdots \Xi_n$, where the $\Xi_m$ are independent random variables with $\Xi_m \sim \mathcal{D}_{p(m)}(1)$. For example, the density of the random variable $\Xi_1 \cdot \Xi_2$ is given by

$$\int_0^\infty f_{\mathcal{D}_{p(2)}(1)}\!\left(\frac{x}{t}\right) f_{\mathcal{D}_{p(1)}(1)}(t)\, \frac{dt}{t} \tag{2.11}$$

(the generalization to more products is straightforward). To see this, we first calculate the probability that $\Xi_1 \cdot \Xi_2 \in [0, x]$ and then differentiate with respect to $x$. Thus

$$\begin{aligned}
\mathrm{Prob}(\Xi_1 \cdot \Xi_2 \in [0, x]) &= \int_{t=0}^{\infty} \mathrm{Prob}\!\left(\Xi_2 \in \left[0, \frac{x}{t}\right]\right) f_{\mathcal{D}_{p(1)}(1)}(t)\, dt \\
&= \int_{t=0}^{\infty} F_{\mathcal{D}_{p(2)}(1)}\!\left(\frac{x}{t}\right) f_{\mathcal{D}_{p(1)}(1)}(t)\, dt.
\end{aligned} \tag{2.12}$$

Differentiating gives the density of $\Xi_1 \cdot \Xi_2$, which equals

$$\int_0^\infty f_{\mathcal{D}_{p(2)}(1)}\!\left(\frac{x}{t}\right) f_{\mathcal{D}_{p(1)}(1)}(t)\, \frac{dt}{t}. \tag{2.13}$$

Thus the convergence to Benford behavior of $X_n$ is equivalent to the convergence to Benford behavior of the product of $n$ independent random variables. This is basically the central limit theorem for random variables modulo 1 (see for example [MN]), and thus yields an alternate proof of this important result (at least in this special case). Note this also gives another explanation for Corollary 2.2.
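The equivalence between chaining and multiplying can be illustrated by simulation. The sketch below is only a sanity check under our own assumptions: it uses a chain of two exponential distributions, compares empirical distribution functions at a few arbitrary points, and all function names, seeds and sample sizes are hypothetical choices.

```python
import random

def sample_chain(samples=100_000, seed=1):
    """Draw X_2, where X_1 ~ Exp(1) and X_2 is exponential with mean X_1."""
    rng = random.Random(seed)
    out = []
    for _ in range(samples):
        x1 = max(rng.expovariate(1.0), 1e-300)   # guard against an exact zero
        out.append(rng.expovariate(1.0 / x1))    # expovariate takes the rate 1/mean
    return out

def sample_product(samples=100_000, seed=2):
    """Draw Xi_1 * Xi_2 with Xi_1, Xi_2 independent standard exponentials."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) * rng.expovariate(1.0) for _ in range(samples)]

def empirical_cdf(values, points):
    """Fraction of the sample that is at most each of the given points."""
    n = len(values)
    return [sum(1 for v in values if v <= p) / n for p in points]

points = [0.01, 0.1, 0.5, 1.0, 3.0]
chain_cdf = empirical_cdf(sample_chain(), points)
product_cdf = empirical_cdf(sample_product(), points)
for p, c, q in zip(points, chain_cdf, product_cdf):
    print(f"x = {p}: chain {c:.4f}, product {q:.4f}")
```

Up to sampling noise, the two columns should agree, as the remark predicts.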

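Proof of Corollary 1.3. The first part follows immediately from Theorem 1.1. For the second claim, we need the Mellin transform of $f_{{\rm Benf},B}$:

$$\begin{aligned}
(\mathcal{M}f_{{\rm Benf},B})(s) &= \int_0^\infty f_{{\rm Benf},B}(x)\, x^s\, \frac{dx}{x} \\
&= \int_1^B \frac{1}{x \log B}\, x^s\, \frac{dx}{x} \\
&= \begin{cases} 1 & \text{if } s = 1 \\ \dfrac{B^{s-1} - 1}{(s - 1) \log B} & \text{if } s \neq 1. \end{cases}
\end{aligned} \tag{2.14}$$

Thus

$$(\mathcal{M}f_{{\rm Benf},B})\!\left(1 - \frac{2\pi i \ell}{\log B}\right) = \begin{cases} 1 & \text{if } \ell = 0 \\ 0 & \text{if } 0 \neq \ell \in \mathbb{Z}. \end{cases} \tag{2.15}$$

Earlier we showed

$$(\mathcal{M}f_n)(s) = \prod_{m=1}^{n} (\mathcal{M}f_{\mathcal{D}_{p(m)}(1)})(s). \tag{2.16}$$

We are assuming that $\mathcal{D}_{p(1)}(1) = \mathcal{D}_{{\rm Benf},B}$, and thus when we evaluate at $s = 1 - \frac{2\pi i \ell}{\log B}$ with $\ell \in \mathbb{Z}$, the only term which survives is when $\ell = 0$. From the proof of Theorem 1.1 we have

$$\mathrm{Prob}(Y_n \bmod 1 \in [a, b]) = b - a + \mathrm{Err}\!\left((b - a) \sum_{\ell \neq 0} |\widehat{g}_n(\ell)|\right), \tag{2.17}$$

where $Y_n = \log_B X_n$ and

$$\widehat{g}_n(\xi) = (\mathcal{M}f_n)\!\left(1 - \frac{2\pi i \xi}{\log B}\right); \tag{2.18}$$

note $X_n$ is Benford base $B$ if and only if $Y_n \bmod 1$ is the uniform distribution. As $\widehat{g}_n(\ell) = 0$ if $0 \neq \ell \in \mathbb{Z}$ (from evaluating the Mellin transform of $f_1$), we obtain that

$$\mathrm{Prob}(Y_n \bmod 1 \in [a, b]) = b - a; \tag{2.19}$$

thus $X_n$ is Benford base $B$ for all $n$. □

Remark 2.4. Note that, unlike the other theorems, we have Benford behavior for a finite value of $n$; there are no error terms. Further, by Corollary 2.2 we obtain that $X_n$ is exactly Benford base $B$ if for some $m \le n$ we have $X_m \sim \mathcal{D}_{{\rm Benf},B}(X_{m-1})$.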
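The vanishing in (2.15) is easy to confirm numerically. The following sketch evaluates the closed form $(B^{s-1} - 1)/((s - 1)\log B)$ of the Mellin transform of the Benford density at the points $s = 1 - 2\pi i \ell / \log B$; the function names are ours, and the non-integer value $\ell = 1/2$ is included only to show that the transform does not vanish away from the integers.

```python
import math

def mellin_benford(s, base=10.0):
    """Closed form of the Mellin transform of the Benford density, as in (2.14)."""
    if s == 1:
        return 1.0 + 0.0j
    return (base ** (s - 1) - 1) / ((s - 1) * math.log(base))

def at_critical_point(ell, base=10.0):
    """Evaluate the transform at s = 1 - 2*pi*i*ell / log(base)."""
    s = 1 - 2j * math.pi * ell / math.log(base)
    return mellin_benford(s, base)

for ell in (0, 1, 2, 3, 0.5):
    print(ell, abs(at_critical_point(ell)))
```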

3. Examples

We give two explicit examples of the types of rapidly converging error estimates easily obtainable from these methods. The first example is chaining exponential distributions. Many processes have wait times governed by a Poisson or exponential distribution; thus applications of these results could be to more involved processes where the wait time parameter depends on another process. For our second example we consider chaining uniform distributions. Our street example gives one instance where this could arise, namely when we choose uniformly among options of varying size.

3.1. Chains of the Exponential Distribution. Let $X_1 \sim \mathrm{Exp}(1)$ (the standard exponential distribution) and $X_m \sim \mathrm{Exp}(X_{m-1})$, and set $Y_m = \log_B X_m$. By Theorem 1.1 we know that as $n \to \infty$ the distribution of digits of $X_n$ tends to Benford's law; we now bound the error term. We need the following two ingredients:

• the Mellin transform of the standard exponential function (which we denote by $f_{\exp}$) is the Gamma function:

$$(\mathcal{M}f_{\exp})(s) = \int_0^\infty \exp(-x)\, x^{s-1}\, dx = \Gamma(s). \tag{3.1}$$

Thus

$$(\mathcal{M}f_{\exp})\!\left(1 - \frac{2\pi i \ell}{\log B}\right) = \Gamma\!\left(1 - \frac{2\pi i \ell}{\log B}\right); \tag{3.2}$$

• for real $x$,

$$|\Gamma(1 + ix)| = \sqrt{\pi x / \sinh(\pi x)}. \tag{3.3}$$

Substituting these into Theorem 1.1 (or Corollary 1.3) gives

$$\mathrm{Prob}(Y_n \bmod 1 \in [a, b]) = b - a + \mathrm{Err}\!\left((b - a) \sum_{\ell \neq 0} \left(\frac{2\pi^2 \ell / \log B}{\sinh(2\pi^2 \ell / \log B)}\right)^{n/2}\right), \tag{3.4}$$

or equivalently the probability that the mantissa of $X_n$ is in $[1, s]$ is

$$\log_B s + \mathrm{Err}\!\left(\log_B s \cdot \sum_{\ell \neq 0} \left(\frac{2\pi^2 \ell / \log B}{\sinh(2\pi^2 \ell / \log B)}\right)^{n/2}\right). \tag{3.5}$$

As $\sinh(x)$ grows exponentially in $x$, we see the above sum converges rapidly (i.e., the large $\ell$ terms are immaterial), and the error term decreases rapidly with $n$.
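To get a feel for the sizes involved, the sketch below evaluates the summand $\left(x/\sinh x\right)^{n/2}$ with $x = 2\pi^2 |\ell| / \log B$ coming from (3.3), together with the sum over $0 < |\ell| \le 50$. The function names and the truncation point are our own choices; the truncation is harmless because the terms decay exponentially in $|\ell|$.

```python
import math

def exp_chain_term(ell, n, base=10.0):
    """|Gamma(1 - 2*pi*i*ell/log B)|^n, computed from (3.3)."""
    x = 2 * math.pi ** 2 * abs(ell) / math.log(base)
    return (x / math.sinh(x)) ** (n / 2)

def exp_chain_error_sum(n, base=10.0, max_ell=50):
    """Sum of the summand over 0 < |ell| <= max_ell (terms are even in ell)."""
    return 2 * sum(exp_chain_term(ell, n, base) for ell in range(1, max_ell + 1))

for n in (2, 3, 5, 10):
    print(f"n = {n}: ell = 1 term {exp_chain_term(1, n):.3e}, "
          f"two-sided sum {exp_chain_error_sum(n):.3e}")
```

The $\ell = \pm 1$ terms dominate completely, so the two-sided sum is essentially twice the $\ell = 1$ term.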

If we take $B = 10$ we find the difference between the probability of observing the mantissa of $X_n$ in $[1, s]$ and the Benford probability of $\log_B s$ is at most $.0033 \log_B s$ if $n = 2$, $.00019 \log_B s$ if $n = 3$, $.000011 \log_B s$ if $n = 5$ and $3.6 \cdot 10^{-13} \log_B s$ if $n = 10$. If $B = 10$ then for all $\ell \ge 1$ we have $\exp(2\pi^2 \ell / \log 10) - \exp(-2\pi^2 \ell / \log 10) \ge \frac{10000}{10001} \exp(2\pi^2 \ell / \log 10)$. Thus the error term is bounded by

3.2. Chains of the Uniform Distribution. Let $X_1 \sim \mathrm{Unif}(0, k)$ (without loss of generality we may assume $k \in [1, 10)$) and set $X_m \sim \mathrm{Unif}(0, X_{m-1})$. If $P_n(s)$ is the probability that the base 10 mantissa of $X_n$ is at most $s$, then

$$P_n(s) = \log_{10} s + \mathrm{Err}\!\left(\frac{k}{s} \frac{(\log k)^{n-1}}{\Gamma(n)} + \left(\frac{1}{2.9^n} + \frac{\zeta(n) - 1}{2.7^n}\right) 2 \log_{10} s\right). \tag{3.7}$$

As the uniform distribution is so easy to work with, we sketch an alternate, more explicit derivation; in fact, it was by generalizing this and the exponential case (which involved properties of the Meijer G-function) that led us to the proof of the general case. One can prove by induction that

$$f_{n,k}(x_n) = \frac{\log^{n-1}(k / x_n)}{k\, \Gamma(n)} \quad \text{for } 0 < x_n \le k. \tag{3.8}$$

For the base case $n = 2$, since $X_1 \sim \mathrm{Unif}(0, k)$ we have

$$\begin{aligned}
F_{2,k}(x_2) &= \int_0^k \mathrm{Prob}(X_2 \le x_2 \mid X_1 = x_1)\, \mathrm{Prob}(X_1 = x_1)\, dx_1 \\
&= \int_0^{x_2} \mathrm{Prob}(X_2 \le x_2 \mid X_1 = x_1)\, \frac{dx_1}{k} + \int_{x_2}^{k} \mathrm{Prob}(X_2 \le x_2 \mid X_1 = x_1)\, \frac{dx_1}{k} \\
&= \int_0^{x_2} \frac{dx_1}{k} + \int_{x_2}^{k} \frac{x_2}{x_1}\, \frac{dx_1}{k} \\
&= \frac{x_2}{k} + \frac{x_2 \log(k / x_2)}{k}.
\end{aligned} \tag{3.9}$$

Differentiating yields

$$f_{2,k}(x_2) = \frac{\log(k / x_2)}{k}, \tag{3.10}$$

which proves the base case. The inductive step follows similarly.
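We have

$$P_n(s) = \sum_{\ell=1}^{\infty} \int_{10^{-\ell}}^{s \cdot 10^{-\ell}} f_{n,k}(x_n)\, dx_n + \int_1^{\min(s,k)} f_{n,k}(x_n)\, dx_n. \tag{3.11}$$

Note for large $n$ the contribution from the second integral is negligible, as the integrand is bounded by $(\log k)^{n-1} / (n - 1)!$, which tends rapidly to 0 for fixed $k$ and increasing $n$. We change variables by letting $u = \log(k / x_n)$. Thus $du = -x_n^{-1} dx_n$ or $dx_n = -k e^{-u} du$. Thus if we set

$$g_n(u) = \begin{cases} \dfrac{u^{n-1} e^{-u}}{\Gamma(n)} & \text{if } u \ge 0 \\ 0 & \text{if } u < 0, \end{cases} \tag{3.12}$$

we find that

$$P_n(s) = \sum_{\ell=-\infty}^{\infty} \int_{\log k + \ell \log 10 - \log s}^{\log k + \ell \log 10} g_n(u)\, du - \int_{\log k - \log s}^{\log k - \log(\min(s,k))} g_n(u)\, du, \tag{3.13}$$

where $g_n(u) = 0$ for $u < 0$ allows us to extend the $\ell$-sum to all integers. The contribution from the second integral is negligible, as it is bounded by $\frac{k}{s} \frac{(\log k)^{n-1}}{\Gamma(n)}$. We evaluate the main term by Poisson Summation. Thus

$$\begin{aligned}
P_n(s) &= \sum_{\ell=-\infty}^{\infty} \int_{\log k + \ell \log 10 - \log s}^{\log k + \ell \log 10} g_n(u)\, du + \mathrm{Err}\!\left(\frac{k}{s} \frac{(\log k)^{n-1}}{\Gamma(n)}\right) \\
&= \int_{\log k - \log s}^{\log k} \sum_{\ell=-\infty}^{\infty} g_n(u + \ell \log 10)\, du + \mathrm{Err}\!\left(\frac{k}{s} \frac{(\log k)^{n-1}}{\Gamma(n)}\right) \\
&= \int_{\log_{10} k - \log_{10} s}^{\log_{10} k} \sum_{\ell=-\infty}^{\infty} g_n\big((w + \ell) \log 10\big) \log 10\, dw + \mathrm{Err}\!\left(\frac{k}{s} \frac{(\log k)^{n-1}}{\Gamma(n)}\right) \\
&= \int_{\log_{10} k - \log_{10} s}^{\log_{10} k} \sum_{\ell=-\infty}^{\infty} h_{n,w}(\ell) \log 10\, dw + \mathrm{Err}\!\left(\frac{k}{s} \frac{(\log k)^{n-1}}{\Gamma(n)}\right),
\end{aligned} \tag{3.14}$$

where $h_{n,w}(t) = g_n((w + t) / T)$ with $T = 1 / \log 10$. We have written our sum like this to facilitate applying the Poisson Summation formula. We have

$$\sum_{\ell=-\infty}^{\infty} g_n\!\left(\frac{w + \ell}{T}\right) = \sum_{\ell=-\infty}^{\infty} h_{n,w}(\ell) = \sum_{\ell=-\infty}^{\infty} \widehat{h}_{n,w}(\ell) = T \sum_{\ell=-\infty}^{\infty} e^{2\pi i w \ell}\, \widehat{g}_n(T\ell). \tag{3.15}$$

Recall that $g_n(u)$ is the density function for the Gamma distribution with parameter $n$. Its characteristic function is well-known to be $\mathbb{E}[e^{itX}] = (1 - it)^{-n}$; thus its Fourier transform (which is $\mathbb{E}[e^{-2\pi i t X}]$) is just $\widehat{g}_n(t) = (1 + 2\pi i t)^{-n}$. Therefore substituting (3.15) into (3.14) and splitting off the contribution from $\ell = 0$ yields

$$P_n(s) = \log_{10} s + \mathrm{Err}\!\left(\frac{k}{s} \frac{(\log k)^{n-1}}{\Gamma(n)}\right) + \sum_{\ell \neq 0} \int_{\log_{10} k - \log_{10} s}^{\log_{10} k} e^{2\pi i w \ell}\, dw \left(1 + \frac{2\pi i \ell}{\log 10}\right)^{-n}. \tag{3.16}$$

The error term is easily analyzed. The contribution from $\ell = \pm 1$ is bounded by $(2.9)^{-n}\, 2 \log_{10} s$, while the $|\ell| \ge 2$ terms contribute at most

$$\frac{2 \log_{10} s}{2.7^n} \sum_{\ell=2}^{\infty} \frac{1}{\ell^n} = \frac{2 \log_{10} s}{2.7^n}\, (\zeta(n) - 1), \tag{3.17}$$

where

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \quad \Re(s) > 1 \tag{3.18}$$

is the Riemann zeta function. Thus

$$P_n(s) = \log_{10} s + \mathrm{Err}\!\left(\frac{k}{s} \frac{(\log k)^{n-1}}{\Gamma(n)} + \left(\frac{1}{2.9^n} + \frac{\zeta(n) - 1}{2.7^n}\right) 2 \log_{10} s\right). \tag{3.19}$$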
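The final bound for the uniform chain, namely $\frac{k}{s}\frac{(\log k)^{n-1}}{\Gamma(n)} + \left(2.9^{-n} + (\zeta(n) - 1)\, 2.7^{-n}\right) 2\log_{10} s$, can be evaluated directly. The sketch below does so for $n \ge 2$; the helper names are ours, and $\zeta(n) - 1$ is approximated by a partial sum plus an integral bound on the tail, which is an implementation choice rather than anything taken from the derivation.

```python
import math

def zeta_minus_one(n, terms=100_000):
    """Upper estimate of zeta(n) - 1 = sum_{l >= 2} l^{-n}, valid for n >= 2.

    The partial sum up to `terms` is padded by the integral bound
    terms^(1 - n) / (n - 1) on the remaining tail.
    """
    partial = sum(l ** -n for l in range(2, terms + 1))
    return partial + terms ** (1 - n) / (n - 1)

def uniform_chain_error(n, k, s):
    """Error bound for P_n(s), with k and s in [1, 10) and n >= 2."""
    first = (k / s) * math.log(k) ** (n - 1) / math.gamma(n)
    second = (2.9 ** -n + zeta_minus_one(n) * 2.7 ** -n) * 2 * math.log10(s)
    return first + second

for n in (2, 5, 10, 20):
    print(f"n = {n}: bound {uniform_chain_error(n, 2.0, 5.0):.3e}")
```

For moderate $n$ the $2.9^{-n}$ term dominates, which reflects the size of $|1 + 2\pi i / \log 10| \approx 2.9$.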
Appendix A. General Powers of Random Variables

Our main theorem considers a chain of random variables, where $X_m \sim \mathcal{D}_{p(m)}(X_{m-1})$. Our proof uses properties of the Mellin transform, and shows the equivalence of chaining to products of random variables.

More generally, for each $m$ let $r(m)$ be a non-zero integer. We consider now $X_m \sim \mathcal{D}_{p(m)}(X_{m-1}^{r(m)})$. Our theorems generalize immediately to this case as well. The key ingredient is the following lemma.

Lemma A.1. Let $W$ have density $\phi$, and for $r \in \mathbb{Z} - \{0\}$ let $U = W^r$ have density $\psi_r$. Then

$$(\mathcal{M}\psi_r)(s) = (\mathcal{M}\phi)(r(s - 1) + 1). \tag{A.1}$$

In particular, taking $s = 1 - \frac{2\pi i \ell}{\log B}$ yields

$$(\mathcal{M}\psi_r)\!\left(1 - \frac{2\pi i \ell}{\log B}\right) = (\mathcal{M}\phi)\!\left(1 - \frac{2\pi i r \ell}{\log B}\right). \tag{A.2}$$

Proof. We calculate the cumulative distribution function of $U$, and then differentiate to get its density. We consider $r > 0$ (the case of $r = -|r| < 0$ is handled similarly). We have

$$\Psi_r(u) = \mathrm{Prob}(U \le u) = \mathrm{Prob}(W^r \le u) = \mathrm{Prob}(W \le u^{1/r}) = \Phi(u^{1/r}), \tag{A.3}$$

where $\Phi$ is the antiderivative of $\phi$. Thus

$$\psi_r(u) = \frac{1}{r}\, \phi(u^{1/r})\, u^{\frac{1}{r} - 1}. \tag{A.4}$$
We now calculate the Mellin transform, again considering just the case of $r > 0$, as the other case follows similarly.

$$\begin{aligned}
(\mathcal{M}\psi_r)(s) &= \int_0^\infty \psi_r(u)\, u^s\, \frac{du}{u} \\
&= \int_0^\infty \frac{1}{r}\, \phi(u^{1/r})\, u^{1/r}\, u^{s-1}\, \frac{du}{u} \\
&= \int_0^\infty \phi(w)\, w^{r(s-1)+1}\, \frac{dw}{w} \\
&= (\mathcal{M}\phi)(r(s - 1) + 1);
\end{aligned} \tag{A.5}$$

the remaining claim follows by direct substitution. □

Remark A.2. For us, one of the most important consequences of Lemma A.1 is that when we evaluate the resulting Mellin transform at $1 - \frac{2\pi i \ell}{\log B}$ we end up with the Mellin transform of another density evaluated at $1 - \frac{2\pi i r \ell}{\log B}$. Thus our arguments from before follow with almost no change; it is essential that the effect of replacing $X_{m-1}$ with $X_{m-1}^{r(m-1)}$ is only to change the imaginary part of where we evaluate. We could take $r(m - 1) \in \mathbb{Q} - \{0\}$ or even $\mathbb{R} - \{0\}$ and the argument would still hold (but now the second part of Corollary 1.3 would fail).
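Lemma A.1 can be checked numerically in a simple case. The sketch below assumes $W$ is a standard exponential, so that $(\mathcal{M}\phi)(s) = \Gamma(s)$ and the lemma predicts $(\mathcal{M}\psi_r)(s) = \Gamma(r(s - 1) + 1)$; it uses real $s$, positive $r$, and a trapezoid rule after the substitution $u = e^t$. The function names and the integration window are our own choices.

```python
import math

def mellin_transform(f, s, t_min=-60.0, t_max=12.0, steps=30_000):
    """Approximate (Mf)(s) = int_0^inf f(u) u^s du/u via u = e^t and the trapezoid rule."""
    h = (t_max - t_min) / steps
    total = 0.0
    for j in range(steps + 1):
        t = t_min + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * f(math.exp(t)) * math.exp(t * s)
    return total * h

def phi(w):
    """Density of W, assumed here to be a standard exponential."""
    return math.exp(-w)

def psi(u, r):
    """Density of U = W^r for r > 0: (1/r) phi(u^(1/r)) u^(1/r - 1)."""
    return (1.0 / r) * phi(u ** (1.0 / r)) * u ** (1.0 / r - 1.0)

for r, s in ((2, 1.5), (3, 1.2), (2, 2.0)):
    lhs = mellin_transform(lambda u: psi(u, r), s)
    rhs = math.gamma(r * (s - 1) + 1)   # (M phi)(r(s - 1) + 1) for the exponential
    print(f"r = {r}, s = {s}: numeric {lhs:.6f}, predicted {rhs:.6f}")
```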